mcp: handle oauth2.RetrieveError in client authorization retry logic #909

Open
smlx wants to merge 1 commit into modelcontextprotocol:main from smlx:retry-retrieve-error

smlx commented Apr 25, 2026

Previously, an expired refresh token in the oauth2.Token returned from OAuthHandler.TokenSource() would cause the connection to fail.

From the client perspective, this meant that the MCP connection was in a hard-failed state with no way to re-authorize.

With this commit, Authorize() is called on an oauth2.RetrieveError as well as in the pre-existing case of a 401/403 HTTP response. Clients handle this in their existing Authorize() flows to obtain a new valid token for the connection.

@maciej-kisiel
Contributor

Could you explain the user journey in more detail, preferably as a list of events? Is it about a situation where the token from authorization was not used for a long time (maybe written to and loaded from disk after a significant time) so that the refresh token is no longer valid?

@smlx
Author

smlx commented Apr 28, 2026

> Is it about a situation where the token from authorization was not used for a long time (maybe written to and loaded from disk after a significant time) so that the refresh token is no longer valid?

Yes, exactly.

> Could you explain the user journey in more detail, preferably as a list of events?

  1. The user authorizes the application.
  2. The resulting oauth2.Token (including the refresh token) is retained, either in persistent storage or in memory.
  3. Time passes. The user does not use the application for an extended period, or the authorization server revokes the refresh token. The refresh token becomes invalid.
  4. The user interacts with the application and the SDK attempts to make a request to the MCP server.
  5. OAuthHandler.TokenSource.Token() is invoked. The underlying oauth2.Config.TokenSource().Token() attempts to refresh the token by hitting the token endpoint. The endpoint responds with a 4xx.
  6. Token() returns a RetrieveError because of the invalid refresh token.

Events 5 and 6 happen here:

go-sdk/mcp/streamable.go

Lines 1910 to 1923 in 93a41b2

func (c *streamableClientConn) setMCPHeaders(req *http.Request) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.oauthHandler != nil {
		ts, err := c.oauthHandler.TokenSource(c.ctx)
		if err != nil {
			return err
		}
		if ts != nil {
			token, err := ts.Token()
			if err != nil {
				return err
			}

Before this change, the RetrieveError would bubble up to Connection.write, which sets writeErr on the connection and so puts it permanently into a failed state. For a client using the connection, this is an unrecoverable error (as far as I can tell; maybe there's another way to recover here?).

After this change, the new doWithAuth wrapper intercepts the oauth2.RetrieveError. It pauses the request and invokes OAuthHandler.Authorize() to start a fresh authorization flow:

  1. The user is prompted to re-authorize using the same Authorize() flow they initially used.
  2. Once the user successfully re-authorizes, the original request is retried with the new valid token.

Comment thread mcp/streamable.go

var authResp *http.Response
var retrieveErr *oauth2.RetrieveError
if err != nil && errors.As(err, &retrieveErr) {
Member

I'm wondering if the check is too broad. Should we only be re-authorizing if retrieveErr.ErrorCode == "invalid_grant"?

The provided authorization grant (e.g., authorization code, resource owner credentials) or refresh token is invalid, expired, revoked, does not match the redirection URI used in the authorization request, or was issued to another client.

https://datatracker.ietf.org/doc/html/rfc6749#section-5.2
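
A narrower check along these lines might look like the following sketch. retrieveError is a hypothetical stand-in that mirrors only the ErrorCode field referenced above, and needsReauthorization is a made-up helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// retrieveError is a hypothetical stand-in mirroring the ErrorCode field of
// oauth2.RetrieveError.
type retrieveError struct{ ErrorCode string }

func (e *retrieveError) Error() string { return "cannot fetch token: " + e.ErrorCode }

// needsReauthorization reports whether err is a token retrieval failure whose
// error code is "invalid_grant", the RFC 6749 section 5.2 code that covers
// invalid, expired, or revoked refresh tokens.
func needsReauthorization(err error) bool {
	var re *retrieveError
	return errors.As(err, &re) && re.ErrorCode == "invalid_grant"
}

func main() {
	fmt.Println(needsReauthorization(fmt.Errorf("wrapped: %w", &retrieveError{ErrorCode: "invalid_grant"})))
	fmt.Println(needsReauthorization(&retrieveError{ErrorCode: "invalid_client"}))
	fmt.Println(needsReauthorization(errors.New("connection reset")))
}
```

With this check, other token endpoint failures such as invalid_client still surface as ordinary errors instead of prompting the user.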

Author

Yes, I think you are right. I'll fix this up.

Comment thread mcp/streamable.go
//
// doRequest should construct and send the HTTP request, and return the sent request (which
// may be needed for authorization), the response (if any), and any error.
func (c *streamableClientConn) doWithAuth(ctx context.Context, requestSummary string, doRequest func() (*http.Request, *http.Response, error)) (*http.Request, *http.Response, error) {
Member

nit: let's make doRequest take a context.Context to avoid unintentionally capturing the wrong context.

Comment thread mcp/streamable.go
return req, resp, err
}

if authErr := c.oauthHandler.Authorize(ctx, req, authResp); authErr != nil {
Member

Looking into Authorize, it relies on authResp having WWW-Authenticate set. If the fallback to well-known doesn't work, the whole Authorize will fail.

Might be a bit hacky, but what if we handle oauth2.RetrieveError earlier by simply not setting the "Authorization" header and letting the request fail with 401/403?

Author

> Looking into Authorize, it relies on authResp having WWW-Authenticate set. If the fallback to well-known doesn't work, the whole Authorize will fail.

I see what you are saying, but how likely/possible is this? Seems to defeat the purpose of being "well-known" 😕

> Might be a bit hacky, but what if we handle oauth2.RetrieveError earlier by simply not setting the "Authorization" header and letting the request fail with 401/403?

That is an interesting idea. In fact, it might be a better solution than this PR. I'll look into it.

Member

@yarolegovich yarolegovich Apr 30, 2026

> I see what you are saying, but how likely/possible is this? Seems to defeat the purpose of being "well-known"

I don't know, but it is possible. The spec says:

MCP servers MUST implement one of the following discovery mechanisms to provide authorization server location information to MCP clients:

  • WWW-Authenticate Header: ...
  • Well-Known URI: ...

So the failed empty-auth request approach should be more reliable. It's also a bit easier to reason about, in the sense that Authorize only ever receives a response from the MCP server, not from the auth server.

@maciej-kisiel and I also considered how the oauth2.Config that the token refresher already has could be passed to Authorize to skip the discovery step during re-authorization, but none of the options seem worth the added code complexity if a single failed request can handle this edge case.

3 participants